Construction of English-Bodo Parallel Text Corpus for Statistical Machine Translation
نویسندگان
چکیده
منابع مشابه
Catalan-English statistical machine translation without a parallel corpus
This paper presents a full experiment on large-vocabulary Catalan-English statistical machine translation without an English-Catalan parallel corpus, in the context of the debates of the European Parliament. For this, we make use of an English-Spanish European Parliament Proceedings parallel corpus and a Spanish-Catalan general newspaper parallel corpus, both of which of more than 30 M words. G...
متن کاملUM-Corpus: A Large English-Chinese Parallel Corpus for Statistical Machine Translation
Parallel corpus is a valuable resource for cross-language information retrieval and data-driven natural language processing systems, especially for Statistical Machine Translation (SMT). However, most existing parallel corpora to Chinese are subject to in-house use, while others are domain specific and limited in size. To a certain degree, this limits the SMT research. This paper describes the ...
متن کاملEuroparl: A Parallel Corpus for Statistical Machine Translation
We collected a corpus of parallel text in 11 languages from the proceedings of the European Parliament, which are published on the web1. This corpus has found widespread use in the NLP community. Here, we focus on its acquisition and its application as training data for statistical machine translation (SMT). We trained SMT systems for 110 language pairs, which reveal interesting clues into the ...
متن کاملSegmentation and alignment of parallel text for statistical machine translation
We address the problem of extracting bilingual chunk pairs from parallel text to create training sets for statistical machine translation. We formulate the problem in terms of a stochastic generative process over text translation pairs, and derive two different alignment procedures based on the underlying alignment model. The first procedure is a now-standard dynamic programming alignment model...
متن کاملAligning Turkish and English Parallel Texts for Statistical Machine Translation
This paper presents a preliminary work on aligning Turkish and English parallel texts towards developing a statistical machine translation system for English and Turkish. To avoid the data sparseness problem and to uncover relations between sublexical components of words such as morphemes, we have converted our parallel texts to a morphemic representation and then used standard word alignment a...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
ژورنال
عنوان ژورنال: International Journal on Natural Language Computing
سال: 2018
ISSN: 2319-4111,2278-1307
DOI: 10.5121/ijnlc.2018.7509